You can also slice data using conditions based on the value of one or more columns.
The general form of the command is: DataFrame[Condition]
data[data["Specialisation"] == "Health and Medicine"]
data[data["Enrolled _ UnderGraduate"] >= 1000]
data[data["Enrolled _ UnderGraduate"] >= data["Enrolled _ UnderGraduate"].mean()]
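The filtering commands above can be tried on a small table. The following is a minimal sketch: the rows and values are made up for illustration and are not taken from the real dataset. It shows that the condition first produces a column of True/False values (a mask), and that the square brackets then keep only the rows marked True.

```python
import pandas as pd

# Hypothetical sample data; the values are invented for illustration only.
data = pd.DataFrame({
    "Specialisation": ["Health and Medicine", "Law", "Humanities", "Law"],
    "Enrolled _ UnderGraduate": [1200, 600, 450, 950],
})

# The condition on its own gives one True/False value per row.
mask = data["Enrolled _ UnderGraduate"] >= 1000

# Passing the mask inside the square brackets keeps only the True rows.
large_courses = data[mask]

# The right-hand side of a condition can also be a computed value,
# such as the mean of the column.
average = data["Enrolled _ UnderGraduate"].mean()
above_average = data[data["Enrolled _ UnderGraduate"] >= average]

print(large_courses)
print(above_average)
```

Storing the mask in its own variable is optional, but it can make longer conditions easier to read and reuse.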
To slice data using more than one condition, put each condition inside its own pair of parentheses ( ) and join the conditions with a logical operator. The symbol & means and. The symbol | means or.
The general form of the command is: DataFrame[(Condition1) & (Condition2)]
data[(data["Specialisation"] == "Law") | (data["Specialisation"] == "Humanities")]
data[(data["Enrolled _ UnderGraduate"] >= 500) & (data["Enrolled _ UnderGraduate"] <= 800)]
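To see how combined conditions behave, here is a short sketch on invented sample data (the rows are assumptions for illustration, not the real dataset). Each condition sits inside its own parentheses, and the combined condition goes inside the square brackets of the DataFrame.

```python
import pandas as pd

# Hypothetical sample data; the values are invented for illustration only.
data = pd.DataFrame({
    "Specialisation": ["Health and Medicine", "Law", "Humanities", "Law"],
    "Enrolled _ UnderGraduate": [1200, 600, 450, 950],
})

# OR: keep rows where either condition is True.
law_or_humanities = data[
    (data["Specialisation"] == "Law") | (data["Specialisation"] == "Humanities")
]

# AND: keep rows where both conditions are True.
mid_sized = data[
    (data["Enrolled _ UnderGraduate"] >= 500)
    & (data["Enrolled _ UnderGraduate"] <= 800)
]

print(law_or_humanities)
print(mid_sized)
```

The parentheses are required because & and | are evaluated before comparisons such as == and >= in Python, so leaving them out typically raises an error or gives an unintended result.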
Sorting data is a technique that displays data in ascending or descending order.
data.sort_values(['ColumnName'], ascending=False)
data.sort_values('Enrolled _ UnderGraduate')
data.sort_values('Enrolled _ UnderGraduate', ascending = False)
To sort data based on more than one column, pass all the column names as a list inside square brackets. The ascending argument can also be a list, with one True or False value per column.
data.sort_values(['Specialisation','Enrolled _ UnderGraduate'],ascending=[True,False])
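The sorting commands above can be checked with a small example. This is a sketch on invented sample data (the rows are assumptions, not the real dataset). It also shows that sort_values returns a new, sorted DataFrame and leaves the original unchanged unless the result is assigned back.

```python
import pandas as pd

# Hypothetical sample data; the values are invented for illustration only.
data = pd.DataFrame({
    "Specialisation": ["Law", "Humanities", "Law", "Humanities"],
    "Enrolled _ UnderGraduate": [600, 450, 950, 700],
})

# Ascending order is the default.
by_enrolment = data.sort_values("Enrolled _ UnderGraduate")

# Descending order.
by_enrolment_desc = data.sort_values("Enrolled _ UnderGraduate", ascending=False)

# Sort by Specialisation (A to Z), then by enrolment (largest first)
# within each specialisation.
by_both = data.sort_values(
    ["Specialisation", "Enrolled _ UnderGraduate"],
    ascending=[True, False],
)

print(by_both)
```

To keep the sorted result, assign it to a variable, for example data = data.sort_values(...).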
Grouping data by columns is useful when the same value appears more than once in a particular column. For example, the same student may be taking more than one course and have a grade in each course. The method needed for this operation is .groupby, followed by an aggregation such as .mean() or .sum().
dataframe.groupby(['Column1', 'Column2']).mean()
data.groupby('Specialization')[['Total Enrolled']].sum()
data.groupby(['Higher Education Institution', 'Specialization'])[['Total Enrolled']].sum()
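The grouping commands above can be illustrated with a small table. This is a sketch only: the institution names and enrolment figures are invented and are not from the real dataset. Rows that share the same value in the grouping column(s) are collected together, and .sum() then adds up 'Total Enrolled' within each group.

```python
import pandas as pd

# Hypothetical sample data; names and numbers are invented for illustration.
data = pd.DataFrame({
    "Higher Education Institution": ["Uni A", "Uni A", "Uni B", "Uni B"],
    "Specialization": ["Law", "Law", "Law", "Humanities"],
    "Total Enrolled": [100, 150, 200, 300],
})

# One row per specialization, with enrolment summed across all institutions.
per_specialization = data.groupby("Specialization")[["Total Enrolled"]].sum()

# One row per (institution, specialization) pair.
per_institution = data.groupby(
    ["Higher Education Institution", "Specialization"]
)[["Total Enrolled"]].sum()

print(per_specialization)
print(per_institution)
```

The double square brackets around 'Total Enrolled' select the column as a DataFrame, so the result is a table rather than a single Series. Replacing .sum() with .mean() would give the average per group instead.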